Understanding and protecting closed-source systems through dynamic analysis
In this dissertation, we focus on dynamic analyses that examine the data handled by programs and operating systems in order to divine the undocumented constraints and implementation details that determine their behavior in the field. First, we introduce a novel technique for uncovering the constraints actually used in OS kernels to decide whether a given instance of a kernel data structure is valid. Next, we tackle the semantic gap problem in virtual machine security: we present a pair of systems that allow, on the one hand, automatic extraction of whole-system algorithms for collecting information about a running system, and, on the other, the rapid identification of “hook points” within a system or program where security tools can interpose to be notified of security-relevant events. Finally, we present and evaluate a new dynamic measure of code similarity that examines the content of the data handled by the code, rather than the syntactic structure of the code itself. This problem has implications both for understanding the capabilities of novel malware and for understanding large binary code bases such as operating system kernels.
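The first contribution above infers the constraints that make a kernel data structure instance "valid" by observing many real instances. A minimal sketch of that idea, in the spirit of dynamic invariant detection (the field names, observed values, and inferred constraint forms here are hypothetical, not the dissertation's actual system):

```python
# Illustrative sketch: infer simple per-field invariants from observed
# instances of a data structure, as a dynamic analysis might.

def infer_invariants(instances):
    """Given dicts mapping field names to observed values, report
    constraints that held across every observation."""
    invariants = {}
    for f in instances[0]:
        values = [inst[f] for inst in instances]
        if all(v == values[0] for v in values):
            invariants[f] = f"== {values[0]}"                      # constant field
        elif all(isinstance(v, int) for v in values):
            invariants[f] = f"in [{min(values)}, {max(values)}]"   # value range
        else:
            invariants[f] = "no invariant found"
    return invariants

# Hypothetical observations of a task-like kernel structure:
observed = [
    {"magic": 0x1234, "pid": 1,   "state": 0},
    {"magic": 0x1234, "pid": 57,  "state": 2},
    {"magic": 0x1234, "pid": 300, "state": 1},
]
print(infer_invariants(observed))
```

A structure instance whose fields violate these learned constraints (e.g. a wrong `magic` value) would then be flagged as invalid.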
Repeatable Reverse Engineering for the Greater Good with PANDA
We present PANDA, an open-source tool that has
been purpose-built to support whole system reverse engineering.
It is built upon the QEMU whole system emulator, and so analyses
have access to all code executing in the guest and all data.
PANDA adds the ability to record and replay executions, enabling
iterative, deep, whole system analyses. Further, the replay log files
are compact and shareable, allowing for repeatable experiments.
A nine-billion-instruction boot of FreeBSD, for example, is represented
by only a few hundred MB. Further, PANDA leverages QEMU's
support of thirteen different CPU architectures to make analyses
of those diverse instruction sets possible within the LLVM IR. In
this way, PANDA can have a single dynamic taint analysis, for
example, that precisely supports many CPUs. PANDA analyses
are written in a simple plugin architecture which includes a
mechanism to share functionality between plugins, increasing
analysis code re-use and simplifying complex analysis development.
We demonstrate PANDA's effectiveness via a number of
use cases, including enabling an old but legitimate version of
Starcraft to run despite a lost CD key, in-depth diagnosis of an
Internet Explorer crash, and uncovering the censorship activities
and mechanisms of a Chinese IM client.
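The record-and-replay capability described above rests on one idea: an execution is reproducible if you log only the nondeterministic inputs it consumes. A conceptual sketch of that idea (a deliberate simplification; the class names and toy "guest" computation are invented for illustration, and real PANDA records device and interrupt data for a whole emulated system):

```python
import random

# Conceptual record/replay sketch: record only the nondeterministic
# inputs an execution consumes, then replay by feeding them back.

class Recorder:
    def __init__(self):
        self.log = []            # the compact, shareable replay log
    def nondet(self):
        v = random.randint(0, 255)   # stands in for device/interrupt input
        self.log.append(v)
        return v

class Replayer:
    def __init__(self, log):
        self._it = iter(log)
    def nondet(self):
        return next(self._it)    # replay the recorded value

def guest_execution(source):
    # Deterministic computation over nondeterministic inputs.
    return sum(source.nondet() for _ in range(5))

rec = Recorder()
first = guest_execution(rec)                  # live run fills the log
second = guest_execution(Replayer(rec.log))   # replay from the log
assert first == second                        # replay reproduces the run
```

Because everything except the logged inputs is deterministic, the log stays small relative to the execution it reproduces, which is why a multi-billion-instruction boot compresses to a few hundred MB.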
Adaptive Grey-Box Fuzz-Testing with Thompson Sampling
Fuzz testing, or "fuzzing," refers to a widely deployed class of techniques
for testing programs by generating a set of inputs for the express purpose of
finding bugs and identifying security flaws. Grey-box fuzzing, the most popular
fuzzing strategy, combines light program instrumentation with a data driven
process to generate new program inputs. In this work, we present a machine
learning approach that builds on AFL, the preeminent grey-box fuzzer, by
adaptively learning a probability distribution over its mutation operators on a
program-specific basis. These operators, which are selected uniformly at random
in AFL and mutational fuzzers in general, dictate how new inputs are generated,
a core part of the fuzzer's efficacy. Our main contributions are two-fold:
First, we show that a sampling distribution over mutation operators estimated
from training programs can significantly improve performance of AFL. Second, we
introduce a Thompson Sampling, bandit-based optimization approach that
fine-tunes the mutator distribution adaptively, during the course of fuzzing an
individual program. A set of experiments across complex programs demonstrates
that tuning the mutational operator distribution generates sets of inputs that
yield significantly higher code coverage and finds more crashes faster and more
reliably than both baseline versions of AFL and other AFL-based learning
approaches.
Comment: Published as a workshop paper in the 11th ACM Workshop on Artificial
Intelligence and Security (AISec '18), co-located with the 25th ACM Conference
on Computer and Communications Security (CCS '18).
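The bandit-based optimization described above can be sketched compactly: maintain a Beta posterior per mutation operator, sample from each posterior, and apply the operator with the highest sample, rewarding it when its mutant reaches new coverage. A hedged sketch of that loop (the operator names, "true" success rates, and fuzzing loop are hypothetical stand-ins, not the paper's implementation):

```python
import random

# Thompson Sampling over mutation operators: each operator gets a
# Beta(alpha, beta) posterior over its probability of producing a
# coverage-increasing mutant.

MUTATORS = ["bitflip", "byteflip", "arith", "havoc"]

def thompson_pick(alpha, beta):
    # Sample a success probability from each posterior and pick the
    # operator with the highest sample (exploration via sampling).
    samples = {m: random.betavariate(alpha[m], beta[m]) for m in MUTATORS}
    return max(samples, key=samples.get)

def fuzz(trials=2000, seed=0):
    random.seed(seed)
    # Hypothetical ground-truth success rates, unknown to the sampler.
    true_rate = {"bitflip": 0.02, "byteflip": 0.05,
                 "arith": 0.03, "havoc": 0.20}
    alpha = {m: 1 for m in MUTATORS}   # Beta(1, 1) uniform priors
    beta  = {m: 1 for m in MUTATORS}
    counts = {m: 0 for m in MUTATORS}
    for _ in range(trials):
        m = thompson_pick(alpha, beta)
        counts[m] += 1
        if random.random() < true_rate[m]:   # mutant found new coverage
            alpha[m] += 1                    # success updates the posterior
        else:
            beta[m] += 1
    return counts

counts = fuzz()
print(counts)   # trials concentrate on the most productive operator
```

The design choice Thompson Sampling buys here is automatic exploration/exploitation balance: early on all operators are tried, but as evidence accumulates the sampler spends most of its budget on whichever operator is actually paying off for this program.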
VeriGen: A Large Language Model for Verilog Code Generation
In this study, we explore the capability of Large Language Models (LLMs) to
automate hardware design by generating high-quality Verilog code, a common
language for designing and modeling digital systems. We fine-tune pre-existing
LLMs on Verilog datasets compiled from GitHub and Verilog textbooks. We
evaluate the functional correctness of the generated Verilog code using a
specially designed test suite, featuring a custom problem set and testing
benches. Here, our fine-tuned open-source CodeGen-16B model outperforms the
commercial state-of-the-art GPT-3.5-turbo model by 1.1% overall.
Upon testing with a more diverse and complex problem set, we find that the
fine-tuned model shows competitive performance against state-of-the-art
GPT-3.5-turbo, excelling in certain scenarios. Notably, it demonstrates a 41%
improvement in generating syntactically correct Verilog code across various
problem categories compared to its pre-trained counterpart, highlighting the
potential of smaller, in-house LLMs in hardware design automation.
Comment: arXiv admin note: text overlap with arXiv:2212.1114
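The functional-correctness evaluation described above amounts to running each generated design against a test bench and scoring pass/fail. A hedged analogy of that scoring step (this is not the paper's harness; Verilog simulation is replaced by Python stand-ins, and the mux problem, test cases, and candidates are all invented for illustration):

```python
# Score a set of candidate completions functionally: a candidate passes
# only if it matches the expected output on every test case.

def run_tests(candidate, cases):
    """Return True iff the candidate matches the reference on all cases."""
    try:
        return all(candidate(x) == expected for x, expected in cases)
    except Exception:
        return False   # a crash counts as a functional failure

# Hypothetical problem: 2-to-1 mux behavior, with its "test bench".
# Input is (select, a, b); output should be b when select is set, else a.
cases = [((0, 5, 9), 5), ((1, 5, 9), 9), ((0, 0, 1), 0)]

candidates = [
    lambda inp: inp[2] if inp[0] else inp[1],   # correct mux
    lambda inp: inp[1] if inp[0] else inp[2],   # selects the wrong input
]

pass_rate = sum(run_tests(c, cases) for c in candidates) / len(candidates)
print(f"functional pass rate: {pass_rate:.0%}")
```

Aggregating such pass/fail results over a problem set gives the kind of functional-correctness comparison the abstract reports between the fine-tuned model and GPT-3.5-turbo.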
Differentially Testing Soundness and Precision of Program Analyzers
In the last decades, numerous program analyzers have been developed both by
academia and industry. Despite their abundance however, there is currently no
systematic way of comparing the effectiveness of different analyzers on
arbitrary code. In this paper, we present the first automated technique for
differentially testing soundness and precision of program analyzers. We used
our technique to compare six mature, state-of-the-art analyzers on tens of
thousands of automatically generated benchmarks. Our technique detected
soundness and precision issues in most analyzers, and we evaluated the
implications of these issues for both designers and users of program analyzers.
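The differential idea above becomes testable once the generated programs have known ground truth: if an analyzer verifies a program whose asserted property is actually false, that is a candidate soundness issue; if it warns on a program whose property is actually true, that is a candidate precision issue. A toy sketch of that classification (the program names, verdicts, and two-analyzer setup are hypothetical stand-ins, not the paper's benchmark generator):

```python
# Differential testing with known ground truth: classify each analyzer
# verdict against the program's actual safety.

def differential_test(programs):
    """programs: list of (name, is_actually_safe, verdict_A, verdict_B),
    where a verdict of True means 'verified safe'."""
    findings = []
    for name, safe, *verdicts in programs:
        for analyzer, verdict in zip("AB", verdicts):
            if verdict and not safe:
                # Verified a property that does not hold.
                findings.append((analyzer, name, "soundness issue"))
            elif not verdict and safe:
                # Warned about a property that does hold.
                findings.append((analyzer, name, "precision issue"))
    return findings

programs = [
    ("p1", True,  True,  True),   # both analyzers correct
    ("p2", True,  False, True),   # A warns on safe code: imprecise
    ("p3", False, False, True),   # B verifies buggy code: unsound
]
findings = differential_test(programs)
print(findings)
```

Generating the benchmark programs so that the truth of each assertion is known by construction is what lets disagreements be classified automatically rather than triaged by hand.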
Leveraging Forensic Tools for Virtual Machine Introspection
Virtual machine introspection (VMI) has formed the basis of a number of novel approaches to security in recent
years. Although the isolation provided by a virtualized environment provides improved security, software that makes
use of VMI must overcome the semantic gap, reconstructing high-level state information from low-level data sources
such as physical memory. The digital forensics community has likewise grappled with semantic gap problems in
the field of forensic memory analysis (FMA), which seeks to extract forensically relevant information from dumps
of physical memory. In this paper, we will show that work done by the forensic community is directly applicable
to the VMI problem, and that by providing an interface between the two worlds, the difficulty of developing new
virtualization security solutions can be significantly reduced.
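Bridging the semantic gap in the way described above means recovering structured, high-level state from a flat view of physical memory using knowledge of guest data-structure layouts, exactly as forensic memory analysis tools do. An illustrative sketch (the record layout, offsets, and process names are fabricated for the example; real tools walk actual kernel structures such as the task list):

```python
import struct

# Reconstruct process records from a raw "physical memory" buffer using
# known struct offsets, forensic-memory-analysis style.

# Assumed guest layout: each record is <pid: uint32><name: 8 bytes>,
# little-endian, packed back to back.
RECORD = struct.Struct("<I8s")

def carve_processes(memory, base, count):
    """Walk a fabricated process table and recover (pid, name) pairs."""
    procs = []
    for i in range(count):
        off = base + i * RECORD.size
        pid, raw_name = RECORD.unpack_from(memory, off)
        procs.append((pid, raw_name.rstrip(b"\x00").decode()))
    return procs

# A fake "memory dump" with two records starting at offset 16.
dump = bytearray(64)
RECORD.pack_into(dump, 16, 1, b"init")
RECORD.pack_into(dump, 16 + RECORD.size, 42, b"sshd")
print(carve_processes(dump, 16, 2))
```

The interface the paper argues for is precisely this kind of layer: the forensic side supplies the layout knowledge and parsing, and the VMI side supplies the live memory view, so a new security tool need not reimplement either.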